Algorithm for Hierarchical Multi-way Divisive Clustering of Document Collections

Author

  • Kazuaki KISHIDA
Abstract

This paper proposes a novel hierarchical divisive clustering algorithm that outputs a multi-branch tree rather than a binary one. To make the algorithm usable on large document sets, a spherical k-means clustering algorithm based on the cosine measure is adopted to partition the document set recursively from the top down. In addition, the number of clusters in each partitioning is selected automatically according to a criterion, so that an optimal multi-way branching is determined for each node of the tree. The paper reports an experimental result that indicates the effectiveness of the proposed algorithm.
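The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not state the selection criterion, so the `score` function below (mean cosine similarity to the assigned centroid minus a per-cluster penalty) is a placeholder assumption, and the names `normalize`, `spherical_kmeans`, `build_tree`, and `leaves`, along with every parameter default, are hypothetical.

```python
import numpy as np

def normalize(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Cluster unit-length rows of X into at most k groups by cosine similarity."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = total / norm
    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids

def score(X, labels, centroids, penalty):
    """Placeholder criterion (an assumption, not taken from the paper)."""
    cohesion = np.mean(np.sum(X * centroids[labels], axis=1))
    return cohesion - penalty * len(centroids)

def build_tree(X, idx, max_k=4, min_size=4, penalty=0.05, depth=0, max_depth=3):
    """Recursively split the documents in idx into a multi-way tree."""
    node = {"docs": idx.tolist(), "children": []}
    if len(idx) < min_size or depth >= max_depth:
        return node
    best_score, best_labels = -np.inf, None
    # k = 1 means "do not split", so the criterion also acts as a stopping rule.
    for k in range(1, min(max_k, len(idx)) + 1):
        labels, centroids = spherical_kmeans(X[idx], k)
        s = score(X[idx], labels, centroids, penalty)
        if s > best_score:
            best_score, best_labels = s, labels
    groups = [idx[best_labels == j] for j in np.unique(best_labels)]
    if len(groups) < 2:
        return node
    node["children"] = [
        build_tree(X, g, max_k, min_size, penalty, depth + 1, max_depth)
        for g in groups
    ]
    return node

def leaves(node):
    """Collect the document indices stored at the leaves of the tree."""
    if not node["children"]:
        return list(node["docs"])
    return [d for child in node["children"] for d in leaves(child)]
```

In this sketch the rows of `X` would be document vectors (for example, term weights) passed through `normalize`, and the tree would be built with `build_tree(X, np.arange(len(X)))`. Each internal node ends up with between 2 and `max_k` children, which is the multi-way branching the abstract refers to; swapping in a different criterion only requires changing `score`.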

Similar resources

On the performance of bisecting K-means and PDDP

problem is known as bisecting divisive clustering. Note that by recursively using a divisive bisecting clustering procedure, the dataset can be partitioned into any given number of clusters. Interestingly enough, the clusters so-obtained are structured as a hierarchical binary tree (or a binary taxonomy). This is the reason why the bisecting divisive approach is very attractive in many applicat...

Hybrid Hierarchical Clustering: an Experimental Analysis

In this paper, we present a hybrid clustering method that combines divisive hierarchical clustering with agglomerative hierarchical clustering. We used the bisect K-means divisive clustering algorithm in our method. First, we cluster the document collection using the bisect K-means clustering algorithm with K’ > K as the total number of clusters. Second, we calculate the centroids of K’ clu...

Cluster Selection in Divisive Clustering Algorithms

The problem this paper focuses on is the classical problem of unsupervised clustering of a data-set. In particular, the bisecting divisive clustering approach is here considered. This approach consists in recursively splitting a cluster into two sub-clusters, starting from the main data-set. This is one of the more basic and common problems in fields like pattern analysis, data mining, document...

Hierarchical Divisive Clustering with Multi View-Point Based Similarity Measure

All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours i...

Hierarchical Clustering in Medical Document Collections: the BIC-Means Method

Hierarchical clustering of text collections is a key problem in document management and retrieval. In partitional hierarchical clustering, which is more efficient than its agglomerative counterpart, the entire collection is split into clusters and the individual clusters are further split until a heuristically-motivated termination criterion is met. In this paper, we define the BIC-means algori...

Publication date: 2014